EXPLICIT SOLUTIONS FOR SOME SIMPLE
Abstract

We  consider  a  decentralized  detection  problem  in  which  a  number  of  identical  sensors  transmit  a 
finite-valued  function  of  their  observations  to  a  fusion  center  which  makes  a  final  decision  on  one  of 
M  alternative  hypotheses.  We  consider  the  case  where  the  number  of  sensors  is  large  and  we  derive 
(asymptotically)  optimal  rules  for  determining  the  messages  of  the  sensors,  for  the  case  where  the 
observations  are  generated  from  a  simple  and  symmetrical  set  of  discrete  distributions.  We  also 
consider  the  tradeoff  between  the  number  of  sensors  and  the  communication  rate  of  each  sensor 
when  there  is  a  constraint  on  the  total  communication  rate  from  the  sensors  to  the  fusion  center. 
1. PROBLEM DEFINITION.


The decentralized detection problem is defined as follows. There are $M \ge 2$ hypotheses $H_1, H_2, \ldots, H_M$ with known a priori probabilities $P(H_j) > 0$, and $N$ sensors. Each sensor $i$ obtains an observation $y_i$, where $y_i$ is a random variable taking values in a set $Y$. We assume that $y_1, \ldots, y_N$ are conditionally independent (given the true hypothesis) and identically distributed with known conditional distributions $P_Y(\cdot \mid H_j)$. Each sensor $i$ evaluates a $D$-valued message $u_i \in \{1, \ldots, D\}$ as a function of its observation and transmits it to a fusion center. Finally, the fusion center declares one of the alternative hypotheses to be true (Fig. 1).

Let $\gamma_i : Y \mapsto \{1, \ldots, D\}$, $i = 1, 2, \ldots, N$, be the function (to be called a decision rule) used by the $i$th sensor to determine its message $u_i$; that is, $u_i = \gamma_i(y_i)$. Let $u_0 \in \{1, \ldots, M\}$ be the decision of the fusion center. This decision is made according to a decision rule $\gamma_0 : \{1, \ldots, D\}^N \mapsto \{1, \ldots, M\}$; that is, $u_0 = \gamma_0(u_1, \ldots, u_N)$. We say that the fusion center makes an error if $u_0 = i$ and $H_i$ is not the true hypothesis. The probability of error is completely determined by the statistics of the observations and by the decision rules $\gamma_0, \gamma_1, \ldots, \gamma_N$; it will be denoted by $J_N(\gamma_0, \ldots, \gamma_N)$. Our problem is to choose the decision rules $\gamma_0, \gamma_1, \ldots, \gamma_N$ of the sensors and of the fusion center so as to minimize the probability of error.

The problem described above and its variations have attracted substantial interest [TeS81], [KuP82], [EkT82], [Tsi84], [TeV84], [TsA85], [PaA86], [HoV86], [ChV86], [Sad86], [Sri86a], [Sri86b], [ReN87a], [ReN87b], [TVB87]. It was first introduced in [TeS81] for the case of two hypotheses ($M = 2$), two sensors ($N = 2$), binary messages ($D = 2$), and for a fixed choice of the fusion center's decision rule $\gamma_0$. It was shown in [TeS81] that, under the conditional independence assumption, each sensor should evaluate its message $u_i$ using a likelihood ratio test with an appropriate threshold. (This conclusion is not valid if the conditional independence assumption is removed, in which case the problem becomes computationally intractable [TsA85].) The optimal thresholds in the likelihood ratio tests of the different sensors can be obtained by solving a system of nonlinear equations. It is important to emphasize that the optimal decision rules for the decentralized problem are not, in general, the same as those that would be derived using the classical theory independently for each sensor. This is because the optimal decision rules are chosen so as to optimize systemwide performance, as opposed to the performance of each individual sensor.

Figure 1

The  performance  of  a  decentralized  detection  system  is  generally  inferior  to  that  of  a  centralized 
system  in  which  all  raw  data  available  are  transmitted  to  the  fusion  center,  due  to  the  loss  of 
information in the local processing. However, decentralized detection is often more practical due
to  the  reduction  of  the  communication  requirements,  as  well  as  because  the  processing  of  the  data 
is  shared  by  a  number  of  different  processors.  On  the  other  hand,  decentralized  detection  problems 
are  qualitatively  different  and  much  more  difficult  than  the  corresponding  centralized  detection 
problems.  For  this  reason,  there  are  very  few  such  problems  that  have  been  solved  analytically.  In 
fact,  most  of  the  theoretical  research  available  is  limited  to  the  derivation  of  necessary  conditions 
for  optimality,  and  these  can  only  be  solved  numerically.  In  contrast,  in  this  paper,  we  identify  a 
special  case  for  which  an  explicit  solution  can  be  obtained  analytically. 

We now define the particular problem to be studied. We assume that there is a one-to-one correspondence between observations and hypotheses and, more specifically, $Y = \{1, \ldots, M\}$. We assume that the conditional distribution of the observation $y$ of any sensor is given by

$$\Pr(y = i \mid H_j) = \begin{cases} 1 - (M-1)\epsilon, & \text{if } i = j, \\ \epsilon, & \text{if } i \ne j, \end{cases}$$

where $\epsilon$ is a scalar satisfying $0 < \epsilon < 1/(M-1)$. In other words, the observation of a sensor indicates the true hypothesis with probability $1 - (M-1)\epsilon$, or it indicates a false hypothesis, in which case each one of the false hypotheses is equally likely (probability $\epsilon$). Furthermore, we assume that the number of sensors is large and we will be looking for an asymptotic solution, as $N \to \infty$.
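As an illustration of the observation model in this paragraph, the following Python sketch builds the matrix of conditional probabilities and draws conditionally independent sensor observations. The function names, the 0-based indexing of hypotheses and observations, and the use of NumPy are assumptions made for this example; they are not part of the original formulation.

```python
import numpy as np

def observation_matrix(M, eps):
    """Return P with P[j, y] = Pr(y | H_j); indices are 0-based (an assumption)."""
    if not 0 < eps < 1 / (M - 1):
        raise ValueError("eps must satisfy 0 < eps < 1/(M - 1)")
    P = np.full((M, M), eps, dtype=float)   # each false hypothesis has probability eps
    np.fill_diagonal(P, 1 - (M - 1) * eps)  # the true hypothesis gets the remaining mass
    return P

def sample_observations(P, true_hypothesis, N, rng):
    """Draw N conditionally i.i.d. observations given the true hypothesis."""
    return rng.choice(P.shape[0], size=N, p=P[true_hypothesis])

P = observation_matrix(M=4, eps=0.1)
y = sample_observations(P, true_hypothesis=2, N=1000, rng=np.random.default_rng(0))
```

Each row of `P` is one conditional distribution, so every row sums to one by construction.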

Our model is undoubtedly too structured to be an exact representation of a realistic problem, the main drawback being the assumption that there is a one-to-one correspondence between hypotheses and possible observations. This assumption becomes fairly reasonable, however, in the following situation (see Fig. 2). Each sensor $i$ receives some observations $z_i$ that it processes in some predetermined way, and comes up with a preliminary decision $y_i \in \{1, \ldots, M\}$ on the identity of the true hypothesis. Then, each sensor $i$ transmits to the fusion center a function $\gamma_i(y_i)$ of its preliminary decision $y_i$. Notice that we are restricting here the message to be a function of the processed observations instead of the raw observations. While such a restriction may result in some loss of performance, it is quite natural in certain contexts, especially if each sensor has a reason to come up with a preliminary decision in a timely manner.

The  above  discussion  notwithstanding,  our  interest  in  this  particular  problem  arises  mainly 
from  the  fact  that  an  explicit  solution  can  be  obtained,  as  will  be  demonstrated  in  the  sequel. 
Furthermore,  the  solution  to  be  derived  provides  insights  and  intuition  on  the  nature  of  optimal 
solutions  to  more  general  problems  for  which  explicit  solutions  are  not  possible.  Such  insights  are 
very  valuable  because  they  can  suggest  interesting  numerical  experiments  and  heuristic  guidelines 
for  coping  with  more  difficult  problems. 

The  remainder  of  this  paper  is  organized  as  follows.  In  Section  2,  we  outline  some  results  from 
[Tsi88]  that  will  be  needed  later.  In  Section  3,  we  introduce  some  notation  and  terminology,  and 
some simple preliminary facts. In Section 4, a complete solution is derived for the case where the
noise parameter $\epsilon$ is small and the number of sensors is large. In Section 5, we provide a partial
extension of the results of Section 4 to the case of a general noise parameter $\epsilon$. Finally, in Section
6,  we  study  the  tradeoff  between  the  number  of  sensors  and  the  communication  rate  of  each  sensor 
when  there  is  a  constraint  on  the  total  communication  rate  from  the  sensors  to  the  fusion  center. 

2.  BACKGROUND. 

As  mentioned  in  the  introduction,  we  will  be  looking  for  an  asymptotic  solution  to  our  problem, 
as  the  number  of  sensors  N  becomes  very  large.  The  basic  theory  concerning  such  an  asymptotic 
solution has been developed in [Tsi88] and we review here the facts that will be needed. Some
experimentation [Pol88] has shown that the asymptotically optimal decision rules perform reasonably
well  for  moderate  numbers  of  sensors. 

We use $\Gamma$ to denote the set of all possible decision rules. Due to the finiteness of the observation set $Y$ and of the message set $\{1, \ldots, D\}$, it is seen that the set $\Gamma$ is also finite. We introduce the shorthand notation $\gamma^N$ to denote a possible choice $(\gamma_0, \gamma_1, \ldots, \gamma_N)$ of decision rules for the $N$-sensor problem. With a reasonable choice of $\gamma^N$, the probability of error $J_N(\gamma^N)$ converges exponentially to zero as $N$ increases. For this reason, we focus on the exponent of the error probability, defined by

$$r_N(\gamma^N) = \frac{\log J_N(\gamma^N)}{N}. \tag{1}$$

Let $R_N = \inf_{\gamma^N} r_N(\gamma^N)$, where the infimum is taken over all possible choices of decision rules for the $N$-sensor problem. Thus, $R_N$ is the optimal exponent. As $N$ tends to infinity, $R_N$ has a limit [Tsi88], which will be denoted by $\Lambda^*$. In the sequel, we will be concerned with choosing the decision rules so that the corresponding error exponent approaches the optimal exponent $\Lambda^*$.

Consider a sensor that uses a particular decision rule $\gamma \in \Gamma$. Conditioned on $H_i$, the probability that the transmitted message takes a particular value $d \in \{1, \ldots, D\}$ is given by $\Pr(\gamma(y) = d \mid H_i)$.

For every $i, j \in \{1, \ldots, M\}$ and every decision rule $\gamma \in \Gamma$, we define a function $\mu_{ij}(\gamma, \cdot) : [0, 1] \mapsto [-\infty, +\infty)$ by

$$\mu_{ij}(\gamma, s) = \log\left[\sum_{d=1}^{D} \big(\Pr(\gamma(y) = d \mid H_i)\big)^{1-s} \big(\Pr(\gamma(y) = d \mid H_j)\big)^{s}\right]. \tag{2}$$

(The convention $0^0 = 0$ is used in this formula.) It is easily verified that $\mu_{ij}(\gamma, s) \le 0$ for every $i, j$, $\gamma \in \Gamma$, $s \in [0, 1]$, and it is also known that $\mu_{ij}(\gamma, s)$ is a convex function of $s$, for every $i, j$, $\gamma \in \Gamma$ [SGB67]. Furthermore, as long as there exists some $y \in Y$ such that $P_Y(y \mid H_i) \cdot P_Y(y \mid H_j) \ne 0$, then $\mu_{ij}(\gamma, s) > -\infty$ for every $s \in [0, 1]$. This turns out to be always the case for our problem except for the uninteresting situation where $M = 2$ and $\epsilon = 1$.
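The definition in Eq. (2) can be evaluated directly for the symmetric observation model of Section 1. The sketch below is only an illustration: it represents a decision rule as a tuple `gamma` with `gamma[y]` the message sent for observation `y`, uses 0-based indices, and the helper names are hypothetical.

```python
import math

def message_probs(gamma, h, M, eps, D):
    """Pr(gamma(y) = d | H_h) for d = 0..D-1 under the symmetric model."""
    probs = [0.0] * D
    for y in range(M):
        probs[gamma[y]] += (1 - (M - 1) * eps) if y == h else eps
    return probs

def mu(gamma, i, j, s, M, eps, D):
    """Evaluate mu_ij(gamma, s) as in Eq. (2)."""
    p = message_probs(gamma, i, M, eps, D)
    q = message_probs(gamma, j, M, eps, D)
    total = 0.0
    for pd, qd in zip(p, q):
        # With the convention 0^0 = 0, a term vanishes whenever either
        # probability is zero, so such terms are skipped.
        if pd > 0 and qd > 0:
            total += pd ** (1 - s) * qd ** s
    return math.log(total) if total > 0 else -math.inf

# Example: M = 3 observations, binary messages, observation 0 in its own cell.
value = mu((0, 1, 1), i=0, j=1, s=0.3, M=3, eps=0.1, D=2)
```

When the two hypotheses fall in the same cell of the rule, the two message distributions coincide and the function evaluates to zero (up to rounding).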


The optimal exponent is given by [Tsi88]

$$\Lambda^* = \min_{x \in X}\ \max_{\{(i,j) \mid i \ne j\}}\ \min_{s \in [0,1]}\ \sum_{\gamma \in \Gamma} x_\gamma\, \mu_{ij}(\gamma, s), \tag{3}$$

where the outer minimization is carried out over all choices of $\{x_\gamma \mid \gamma \in \Gamma\}$ satisfying $x_\gamma \ge 0$ for all $\gamma \in \Gamma$, and $\sum_{\gamma \in \Gamma} x_\gamma = 1$. In the sequel, we use $x$ to denote a vector $\{x_\gamma \mid \gamma \in \Gamma\}$. Furthermore, we use $X$ to denote the set of all such vectors which satisfy the constraints just stated.


The variable $x_\gamma$ in Eq. (3) should be interpreted as the fraction of the sensors that use decision rule $\gamma$. More specifically, let us fix some $x \in X$. For each $\gamma \in \Gamma$, let $\lfloor N x_\gamma \rfloor$ sensors use decision rule $\gamma$. (If for some $\gamma$ the value of $N x_\gamma$ is not an integer, this determines the decision rules for fewer than $N$ sensors. However, the remaining sensors constitute a vanishingly small fraction of the total, as $N \to \infty$, and are inconsequential.) Then, the asymptotic exponent (as $N \to \infty$) of the probability of error is given by [Tsi88]

$$\max_{\{(i,j) \mid i \ne j\}}\ \min_{s \in [0,1]}\ \sum_{\gamma \in \Gamma} x_\gamma\, \mu_{ij}(\gamma, s). \tag{4}$$

In particular, if the fractions $x_\gamma$ are chosen to minimize the exponent in Eq. (4), then the optimal exponent $\Lambda^*$ is obtained [compare with Eq. (3)]. Notice that the problem formulation has taken a somewhat different, but equivalent, form: instead of choosing the decision rule of each sensor, we are now trying to choose the fraction $x_\gamma$ of the sensors that use a given decision rule $\gamma \in \Gamma$.

Equation (4) has a simple interpretation. The quantity $\min_{s \in [0,1]} \sum_{\gamma \in \Gamma} x_\gamma \mu_{ij}(\gamma, s)$ is the exponent in the Chernoff bound for the probability of confusing hypotheses $H_i$ and $H_j$ ([VaT68], [SGB67]), and such a bound is known to be asymptotically tight. The maximization over all $i$ and $j$ in Eq. (4) corresponds to the fact that the dominant term in the probability of error comes from the worst (i.e., the largest) of the exponents corresponding to the different pairs.
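For a given vector of fractions, the exponent of Eq. (4) can be approximated numerically. The following sketch is one possible way to do so, assuming the symmetric observation model of Section 1; the inner minimization over $s$ is replaced by a search over a uniform grid (an approximation chosen here for simplicity), and the function names and 0-based indexing are hypothetical.

```python
import itertools
import math

def mu(gamma, i, j, s, M, eps):
    """mu_ij(gamma, s) of Eq. (2) for the symmetric model; gamma[y] is the message for y."""
    D = max(gamma) + 1
    p, q = [0.0] * D, [0.0] * D
    for y in range(M):
        p[gamma[y]] += (1 - (M - 1) * eps) if y == i else eps
        q[gamma[y]] += (1 - (M - 1) * eps) if y == j else eps
    return math.log(sum(a ** (1 - s) * b ** s for a, b in zip(p, q) if a > 0 and b > 0))

def error_exponent(rules, x, M, eps, grid=1001):
    """Approximate Eq. (4): worst pair of the s-minimized weighted sums."""
    worst = -math.inf
    # By Eq. (9)-type symmetry, (i, j) and (j, i) give the same minimum,
    # so unordered pairs are sufficient.
    for i, j in itertools.combinations(range(M), 2):
        best = min(
            sum(xg * mu(g, i, j, k / (grid - 1), M, eps) for g, xg in zip(rules, x))
            for k in range(grid)
        )
        worst = max(worst, best)
    return worst

# Three hypotheses, binary messages: each rule isolates one observation.
rules = [(0, 1, 1), (1, 0, 1), (1, 1, 0)]
exponent = error_exponent(rules, [1 / 3, 1 / 3, 1 / 3], M=3, eps=0.1)
```

Because each weighted sum is convex in $s$, a finer grid or a one-dimensional convex minimizer could be substituted for the grid search without changing the structure of the computation.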

The outer minimization in Eq. (3) appears to be simple because it involves linear constraints and a cost function which is linear in the variables $x_\gamma$. However, the inner minimization (with respect to $s$) severely complicates the computation of $\Lambda^*$ and of the optimal values of the variables $x_\gamma$. In the next two sections, we get around this difficulty by exploiting the symmetry of the problem to remove the dependence on $s$.

3.  PRELIMINARIES. 

Consider a decision rule $\gamma : Y \mapsto \{1, \ldots, D\}$ and let $Y_{d,\gamma} = \{y \mid \gamma(y) = d\}$. We notice that the sets $Y_{d,\gamma}$, $d = 1, \ldots, D$, are disjoint and their union equals $Y$. Thus, a decision rule determines a partition of $Y$ into $D$ disjoint sets. It is possible that two different functions $\gamma : Y \mapsto \{1, \ldots, D\}$ and $\gamma' : Y \mapsto \{1, \ldots, D\}$ determine the same partition. [For example, consider the case where $\gamma'(y) = D + 1 - \gamma(y)$.] On the other hand, if $\gamma$ and $\gamma'$ determine the same partition, then each one of the messages $\gamma(y)$ and $\gamma'(y)$ conveys the same information to the fusion center, and the two decision rules can be considered equivalent. From now on, we will not distinguish between equivalent decision rules and we will consider them to be identical. We are therefore adopting the alternative definition that a decision rule is a partition of $Y$ into subsets $Y_{1,\gamma}, \ldots, Y_{D,\gamma}$. We assume that the sets $Y_{d,\gamma}$ are arranged in order of increasing cardinality; that is, $|Y_{1,\gamma}| \le \cdots \le |Y_{D,\gamma}|$.

Definition: Two observations $i, j \in Y$ are separated by a decision rule $\gamma$ if $i$ and $j$ belong to different elements $Y_{d,\gamma}$ of the partition corresponding to $\gamma$. We let $\Gamma_{ij}$ be the set of all $\gamma \in \Gamma$ that separate $i$ and $j$. The number of separations corresponding to a decision rule $\gamma$ is defined as the number of (unordered) pairs of observations $i, j \in Y$ which are separated by $\gamma$.


Notice that an $M$-ary hypothesis testing problem can be viewed as a collection of several binary hypothesis testing problems, one for each pair of hypotheses. The number of separations corresponding to a decision rule $\gamma$ can be interpreted as the number of binary problems for which a message $\gamma(y_i)$ provides useful information.

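Definition: Let $\delta_1, \ldots, \delta_D$ be a collection of nonnegative integers satisfying $\delta_1 \le \delta_2 \le \cdots \le \delta_D$ and $\sum_{d=1}^{D} \delta_d = M$. The class $C^{\delta_1, \ldots, \delta_D}$ is the set of all $\gamma \in \Gamma$ such that $|Y_{d,\gamma}| = \delta_d$ for every $d$.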

These  definitions  are  illustrated  in  Fig.  3. 

Let $L$ be the number of different classes. In order to facilitate notation, we assume that the different classes have been arranged according to some arbitrary order and we will use the simpler notation $C_\ell$ to denote the $\ell$th class, $\ell = 1, \ldots, L$. Thus, the set $\Gamma$ of all decision rules is equal to $\bigcup_{\ell=1}^{L} C_\ell$.

It is seen that the number of separations is the same for all decision rules belonging to the same class $C_\ell$ [see Fig. 3], and will be denoted by $S_\ell$. In particular,

$$S_\ell = \frac{1}{2} \sum_{d=1}^{D} \delta_d (M - \delta_d), \tag{5}$$

where $\delta_1, \ldots, \delta_D$ are such that $C_\ell = C^{\delta_1, \ldots, \delta_D}$. [The factor 1/2 in Eq. (5) is present because otherwise each unordered pair would be counted twice.]
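A small check of Eq. (5) against the definition of the number of separations may be helpful. The sketch below computes the count in two ways: from the cell sizes via Eq. (5), and by directly counting separated pairs for one decision rule. The representation of a rule as a tuple of message labels and the function names are assumptions made for illustration.

```python
from itertools import combinations

def separations_from_sizes(deltas):
    """Number of separations from the cell sizes, following Eq. (5)."""
    M = sum(deltas)
    # sum of delta * (M - delta) counts every separated pair twice
    return sum(d * (M - d) for d in deltas) // 2

def separations_by_counting(gamma):
    """Count unordered pairs of observations that receive different messages."""
    return sum(1 for i, j in combinations(range(len(gamma)), 2) if gamma[i] != gamma[j])

# A rule for M = 5, D = 2 whose cells have sizes 2 and 3.
gamma = (0, 0, 1, 1, 1)
same = separations_from_sizes([2, 3]) == separations_by_counting(gamma)
```

For this rule both computations give 6, since each of the 2 observations in the first cell is separated from each of the 3 observations in the second.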

Let $Q_\ell$ be the cardinality of the set of all triples $(i, j, \gamma)$ such that $\gamma \in C_\ell$ and $\gamma$ separates $i$ and $j$. [The two triples $(i, j, \gamma)$ and $(j, i, \gamma)$ are only counted once.] Since the number of separations corresponding to any $\gamma \in C_\ell$ is $S_\ell$, we see that $Q_\ell = |C_\ell| \cdot S_\ell$. On the other hand, every pair $(i, j)$ is separated by exactly $|C_\ell \cap \Gamma_{ij}|$ elements of $C_\ell$. By symmetry, the cardinality of $C_\ell \cap \Gamma_{ij}$ is the same for every $i$ and $j$. Furthermore, since there exist $M(M-1)/2$ different (unordered) pairs $(i, j)$, we conclude that $Q_\ell = |C_\ell \cap \Gamma_{ij}| \cdot M(M-1)/2$. By equating the two alternative expressions for $Q_\ell$, we obtain

$$\frac{|C_\ell \cap \Gamma_{ij}|}{|C_\ell|} = \frac{2 S_\ell}{M(M-1)}, \tag{6}$$

a fact that will be useful later.
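The counting argument leading to Eq. (6) can be checked by brute-force enumeration in the binary-message case. The sketch below is restricted to $D = 2$ (an assumption made to keep the enumeration short): a rule in the class with cell sizes $(k, M - k)$ is identified with the $k$-element cell. When $k = M - k$ this lists every partition twice, which leaves the ratio unchanged. Function names and 0-based indices are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def separating_fraction(M, k, i=0, j=1):
    """Fraction of D = 2 rules with cell sizes (k, M - k) that separate i and j."""
    cells = list(combinations(range(M), k))
    separating = sum(1 for cell in cells if (i in cell) != (j in cell))
    return Fraction(separating, len(cells))

def predicted_fraction(M, k):
    """Right-hand side of Eq. (6), with S computed from Eq. (5) for D = 2."""
    S = k * (M - k)
    return Fraction(2 * S, M * (M - 1))

ratio = separating_fraction(5, 2)
```

For $M = 5$ and $k = 2$, 6 of the 10 rules separate any given pair, which matches $2 S_\ell / (M(M-1)) = 12/20$.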


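We now derive the form of the functions $\mu_{ij}(\gamma, s)$. Suppose that $i \in Y_{\eta,\gamma}$ and $j \in Y_{\xi,\gamma}$. Using the notation $\delta_\eta = |Y_{\eta,\gamma}|$ and $\delta_\xi = |Y_{\xi,\gamma}|$, it is seen [cf. Eq. (2)] that

$$\mu_{ij}(\gamma, s) = \log\Big[\big(1 - (M - \delta_\eta)\epsilon\big)^{1-s} (\delta_\eta \epsilon)^{s} + (\delta_\xi \epsilon)^{1-s} \big(1 - (M - \delta_\xi)\epsilon\big)^{s} + (M - \delta_\eta - \delta_\xi)\epsilon\Big], \quad \text{if } \eta \ne \xi, \tag{7}$$

$$\mu_{ij}(\gamma, s) = 0, \quad \text{if } \eta = \xi. \tag{8}$$

Notice that the case $\eta = \xi$ [cf. Eq. (8)] corresponds to the case $\gamma \notin \Gamma_{ij}$. Finally, from either Eq. (2) or Eq. (7), it is seen that

$$\mu_{ij}(\gamma, s) = \mu_{ji}(\gamma, 1 - s), \tag{9}$$

which will be useful later.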


4.  THE  SMALL  NOISE  CASE. 

In this section, we derive the solution of the problem under consideration for the case where the noise parameter $\epsilon$ is small. This is accomplished by showing that the minimum with respect to $s$ in Eq. (3) is approximately attained for $s = 1/2$, which allows us to eliminate $s$.

Lemma 1: Fix some $\epsilon_0$ such that $0 < \epsilon_0 < 1/(M-1)$. Then, there exist constants $G_1$ and $G_2$ such that, for every $\epsilon \in (0, \epsilon_0)$, every $i, j \in \{1, \ldots, M\}$ such that $i \ne j$, and every $x \in X$, we have

$$G_1 + \frac{1}{2} \log \epsilon \sum_{\gamma \in \Gamma_{ij}} x_\gamma \ \le\ \min_{s \in [0,1]} \sum_{\gamma \in \Gamma} x_\gamma\, \mu_{ij}(\gamma, s) \ \le\ G_2 + \frac{1}{2} \log \epsilon \sum_{\gamma \in \Gamma_{ij}} x_\gamma.$$


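Proof: We first prove the right-hand side inequality. Consider some $\gamma \in \Gamma_{ij}$ and suppose that $i \in Y_{\eta,\gamma}$, $j \in Y_{\xi,\gamma}$. We have [cf. Eq. (7)]

$$\begin{aligned} e^{\mu_{ij}(\gamma, 1/2)} &= \big(1 - (M - \delta_\eta)\epsilon\big)^{1/2} (\delta_\eta \epsilon)^{1/2} + (\delta_\xi \epsilon)^{1/2} \big(1 - (M - \delta_\xi)\epsilon\big)^{1/2} + (M - \delta_\eta - \delta_\xi)\epsilon \\ &\le (\delta_\eta \epsilon)^{1/2} + (\delta_\xi \epsilon)^{1/2} + (M - \delta_\eta - \delta_\xi)\epsilon^{1/2} \le H_2\, \epsilon^{1/2}, \end{aligned}$$

where $H_2 = \delta_\eta^{1/2} + \delta_\xi^{1/2} + M - \delta_\eta - \delta_\xi \ge 1$. (Since $\delta_\eta$ and $\delta_\xi$ take finitely many values, $H_2$ can be replaced by its largest value over all such pairs, so that it does not depend on $\gamma$.) Taking logarithms, we obtain $\mu_{ij}(\gamma, 1/2) \le G_2 + (\log \epsilon)/2$, where $G_2 = \log H_2 \ge 0$. Furthermore, if $\gamma \notin \Gamma_{ij}$, we have $\mu_{ij}(\gamma, 1/2) = 0$ [cf. Eq. (8)]. It follows that

$$\sum_{\gamma \in \Gamma} x_\gamma\, \mu_{ij}(\gamma, 1/2) \le G_2 + \frac{1}{2} \log \epsilon \sum_{\gamma \in \Gamma_{ij}} x_\gamma.$$

If a minimization over $s$ is carried out, the resulting value is no larger than the one corresponding to $s = 1/2$, and this proves the right-hand side inequality.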

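We now prove the left-hand side inequality. We fix some $i, j$, some $\gamma \in \Gamma_{ij}$, and some $s \in (0, 1/2]$. We assume again that $i \in Y_{\eta,\gamma}$ and $j \in Y_{\xi,\gamma}$. We have

$$\begin{aligned} e^{\mu_{ij}(\gamma, s)} &= \big(1 - (M - \delta_\eta)\epsilon\big)^{1-s} (\delta_\eta \epsilon)^{s} + (\delta_\xi \epsilon)^{1-s} \big(1 - (M - \delta_\xi)\epsilon\big)^{s} + (M - \delta_\eta - \delta_\xi)\epsilon \\ &\ge \big(1 - (M - \delta_\eta)\epsilon\big)^{1-s} (\delta_\eta \epsilon)^{s} \ge \big(1 - (M - \delta_\eta)\epsilon\big)\epsilon^{s} \ge \big(1 - (M-1)\epsilon_0\big)\epsilon^{s} = H_1 \epsilon^{s} \ge H_1 \epsilon^{1/2}, \end{aligned}$$

where $H_1 = 1 - (M-1)\epsilon_0 > 0$. Taking logarithms, we obtain $\mu_{ij}(\gamma, s) \ge G_1 + (\log \epsilon)/2$, where $G_1 = \log H_1 < 0$. The same conclusion is obtained by a symmetrical argument for the case $s \in [1/2, 1]$. Using again the fact that $\mu_{ij}(\gamma, s) = 0$ if $\gamma \notin \Gamma_{ij}$, we obtain

$$\min_{s \in [0,1]} \sum_{\gamma \in \Gamma} x_\gamma\, \mu_{ij}(\gamma, s) \ \ge\ \sum_{\gamma \in \Gamma} x_\gamma \min_{s \in [0,1]} \mu_{ij}(\gamma, s) \ \ge\ \sum_{\gamma \in \Gamma_{ij}} x_\gamma \Big(\frac{1}{2} \log \epsilon + G_1\Big) \ \ge\ G_1 + \frac{1}{2} \log \epsilon \sum_{\gamma \in \Gamma_{ij}} x_\gamma,$$

which completes the proof. Q.E.D.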


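We notice that as $\epsilon$ approaches zero, $\log \epsilon$ tends to $-\infty$, while the constants $G_1, G_2$ of Lemma 1 remain unchanged. Therefore, by retaining the dominant term, $\Lambda^*$ can be approximated, for small $\epsilon$, by

$$\Lambda^* \approx \frac{1}{2} \min_{x \in X}\ \max_{\{(i,j) \mid i \ne j\}}\ \log \epsilon \sum_{\gamma \in \Gamma_{ij}} x_\gamma. \tag{10}$$

Since $\log \epsilon$ is negative, an equivalent optimization problem is

$$\max_{x \in X}\ \min_{\{(i,j) \mid i \ne j\}}\ \sum_{\gamma \in \Gamma_{ij}} x_\gamma. \tag{11}$$

We now derive the solution of (11).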


Proposition 1: Let $S^* = \max_\ell S_\ell$. Then, a vector $x \in X$ is an optimal solution of the problem (11) if and only if the following two conditions hold:

(i) The value of $\sum_{\gamma \in \Gamma_{ij}} x_\gamma$ is the same for every pair $(i, j)$ such that $i \ne j$.

(ii) If $\gamma \in C_\ell$ and $S_\ell < S^*$, then $x_\gamma = 0$.

Furthermore, the optimal value of (11) is equal to $2S^*/(M(M-1))$.

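Proof: Suppose that a vector $x^* \in X$ satisfies conditions (i) and (ii), and let $c$ be such that $c = \sum_{\gamma \in \Gamma_{ij}} x^*_\gamma$, for $i \ne j$. Summing over all unordered pairs we obtain

$$c\, \frac{M(M-1)}{2} = \sum_{\{(i,j) \mid i \ne j\}} \sum_{\gamma \in \Gamma_{ij}} x^*_\gamma = \sum_{\gamma \in \Gamma} \sum_{\{(i,j) \mid \gamma \in \Gamma_{ij}\}} x^*_\gamma = \sum_{\gamma \in \Gamma} S^* x^*_\gamma = S^*.$$

[Here we used the fact that if $\gamma \in C_\ell$, then the cardinality of the set $\{(i,j) \mid \gamma \in \Gamma_{ij}\}$ is $S_\ell$, by definition; we then used property (ii) to replace $S_\ell$ by $S^*$.] We conclude that if conditions (i) and (ii) hold, then $c = 2S^*/(M(M-1))$.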


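In order to show that the vector $x^*$ is actually optimal, it is sufficient to show that

$$\min_{\{(i,j) \mid i \ne j\}} \sum_{\gamma \in \Gamma_{ij}} x_\gamma \le \frac{2S^*}{M(M-1)}$$

for every vector $x \in X$. We use the elementary fact that the minimum of a set of numbers is no larger than their average, to obtain

$$\frac{M(M-1)}{2} \min_{\{(i,j) \mid i \ne j\}} \sum_{\gamma \in \Gamma_{ij}} x_\gamma \ \le\ \sum_{\{(i,j) \mid i \ne j\}} \sum_{\gamma \in \Gamma_{ij}} x_\gamma = \sum_{\ell} \sum_{\gamma \in C_\ell} \sum_{\{(i,j) \mid \gamma \in \Gamma_{ij}\}} x_\gamma = \sum_{\ell} \sum_{\gamma \in C_\ell} S_\ell\, x_\gamma \ \le\ S^* \sum_{\ell} \sum_{\gamma \in C_\ell} x_\gamma = S^*, \tag{12}$$

as desired. We conclude that $x^*$ is optimal.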


For the converse, let us suppose that a vector $x \in X$ is optimal. We have already established that the optimal value of the objective function under consideration is equal to $2S^*/(M(M-1))$. Therefore, all inequalities in Eq. (12) must be equalities. Since the first inequality in Eq. (12) is not strict, condition (i) follows. Furthermore, since the second inequality in Eq. (12) is not strict, condition (ii) follows. Q.E.D.


Using Prop. 1, one optimal solution for the problem (11) is the following. Choose a class $C_{\ell^*}$ such that $S_{\ell^*} = S^* = \max_\ell S_\ell$ and let

$$x_\gamma = \begin{cases} 0, & \text{if } \gamma \notin C_{\ell^*}, \\ \dfrac{1}{|C_{\ell^*}|}, & \text{if } \gamma \in C_{\ell^*}. \end{cases} \tag{13}$$

It is seen that this vector $x$ is feasible ($x \in X$) and satisfies the optimality conditions of Prop. 1. Let us point out that an optimal solution of the problem (11) is in general not unique. The solution provided by Eq. (13) can be singled out because of its special symmetry properties.



The class $C_{\ell^*}$, which is a class of decision rules with a maximal number of separations, should be viewed as a "best" class: according to Prop. 1, only decision rules in such a class should be used. This is very intuitive because each $\gamma \in C_\ell$ provides information to the fusion center which is useful in discriminating $S_\ell$ pairs of hypotheses (by the definition of $S_\ell$). The larger the value of $S_\ell$, the larger the contribution of a decision rule $\gamma \in C_\ell$ in discriminating between the different hypotheses.

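We now proceed to determine the best class $C_{\ell^*}$. Suppose that $C_{\ell^*} = C^{\delta_1, \ldots, \delta_D}$, for some integer coefficients $\delta_1, \ldots, \delta_D$ whose sum is equal to $M$. Suppose that there exist some $\eta$ and $\xi$ such that $\delta_\eta - \delta_\xi > 1$. Consider a new class $C_{\ell'} = C^{\delta'_1, \ldots, \delta'_D}$, where $\delta'_\eta = \delta_\eta - 1$, $\delta'_\xi = \delta_\xi + 1$, and $\delta'_d = \delta_d$ if $d \ne \eta$ and $d \ne \xi$. Using Eq. (5), we obtain

$$\begin{aligned} 2(S_{\ell'} - S_{\ell^*}) &= \delta'_\xi (M - \delta'_\xi) + \delta'_\eta (M - \delta'_\eta) - \delta_\xi (M - \delta_\xi) - \delta_\eta (M - \delta_\eta) \\ &= (\delta_\xi + 1)(M - \delta_\xi - 1) + (\delta_\eta - 1)(M - \delta_\eta + 1) - \delta_\xi (M - \delta_\xi) - \delta_\eta (M - \delta_\eta) \\ &= 2(\delta_\eta - \delta_\xi - 1) > 0, \end{aligned}$$

which contradicts the optimality of $S_{\ell^*}$. This shows that $|\delta_\eta - \delta_\xi| \le 1$ for all $\eta, \xi$. Given that the average of the coefficients $\delta_d$ must be equal to $M/D$, it follows that for every $d$ we must have either $\delta_d = \lfloor M/D \rfloor$ or $\delta_d = \lceil M/D \rceil$. In particular, if $M$ is divisible by $D$, then $\delta_d = M/D$ for every $d$. If $M$ is not divisible by $D$, the number of $\delta_d$'s for which $\delta_d = \lfloor M/D \rfloor$ is uniquely determined by the requirement $\sum_{d=1}^{D} \delta_d = M$.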

We conclude that with decision rules belonging to the best class $C_{\ell^*}$, the corresponding partitions of the observation set $Y$ are as even as possible. For example, if $D = 2$ and $M$ is even, the set $Y$ is to be partitioned into two subsets with equal cardinalities. Also, for the example of Fig. 3 in which $M = 5$ and $D = 2$, the best class is the class $C^{2,3}$. Notice that $C^{2,3}$ has 10 different elements; thus, an optimal solution is to divide the sensors into ten groups of equal cardinality and to let all the sensors in each group use a particular decision rule belonging to the class $C^{2,3}$.
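The "as even as possible" characterization of the best class is easy to compute and to cross-check by exhaustive search over classes. The sketch below is an illustration only: `even_sizes` returns the cell sizes described above, and `all_classes` enumerates every admissible tuple of cell sizes so that the maximizer of Eq. (5) can be found by brute force. All function names are hypothetical.

```python
def even_sizes(M, D):
    """Cell sizes of the most even class: floor(M/D) or ceil(M/D), nondecreasing."""
    q, r = divmod(M, D)
    return tuple([q] * (D - r) + [q + 1] * r)

def separations(deltas):
    """Number of separations of a class, following Eq. (5)."""
    M = sum(deltas)
    return sum(d * (M - d) for d in deltas) // 2

def all_classes(M, D, smallest=0):
    """Yield nondecreasing tuples of D nonnegative integers summing to M."""
    if D == 1:
        if M >= smallest:
            yield (M,)
        return
    for first in range(smallest, M // D + 1):
        for rest in all_classes(M - first, D - 1, first):
            yield (first,) + rest

best_by_search = max(all_classes(5, 2), key=separations)
best_by_formula = even_sizes(5, 2)
```

For $M = 5$ and $D = 2$ both approaches give cell sizes $(2, 3)$, with 6 separations.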

5.  THE  GENERAL  CASE. 

We now consider the case where $\epsilon$ does not tend to zero but is fixed instead at some nonzero value in the range $0 < \epsilon < 1/(M-1)$. Unfortunately, despite the symmetry of the optimization problem defining $\Lambda^*$, symmetry considerations alone are not sufficient to ascertain that the optimal value of the vector $x$ possesses symmetry properties similar to the ones obtained in the previous section. We demonstrate this by means of a simple example.*


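Example: Let there be three hypotheses ($M = 3$) and let the messages be binary ($D = 2$). In this case there are exactly three decision rules, the following: the $i$th decision rule $\gamma_i$, $i = 1, 2, 3$, is defined by $\gamma_i(i) = 1$ and $\gamma_i(j) = 2$ if $j \ne i$. Notice that $\mu_{12}(\gamma_3, s) = \mu_{13}(\gamma_2, s) = \mu_{23}(\gamma_1, s) = 0$, for every $s$. Let

$$\nu(s) = \log\big[(1 - 2\epsilon)^{1-s} \epsilon^{s} + (2\epsilon)^{1-s} (1 - \epsilon)^{s}\big].$$

It is seen [cf. Eq. (2)] that $\mu_{ij}(\gamma_i, s) = \nu(s)$ and $\mu_{ji}(\gamma_i, s) = \nu(1 - s)$, for every $i \ne j$. Substituting in Eq. (3), and using the notation $x_i = x_{\gamma_i}$, we obtain

$$\begin{aligned} \Lambda^* = \min_{x \in X} \max\Big\{ &\min_{s \in [0,1]} \big[x_1 \mu_{12}(\gamma_1, s) + x_2 \mu_{12}(\gamma_2, s) + x_3 \mu_{12}(\gamma_3, s)\big], \\ &\min_{s \in [0,1]} \big[x_1 \mu_{13}(\gamma_1, s) + x_2 \mu_{13}(\gamma_2, s) + x_3 \mu_{13}(\gamma_3, s)\big], \\ &\min_{s \in [0,1]} \big[x_1 \mu_{23}(\gamma_1, s) + x_2 \mu_{23}(\gamma_2, s) + x_3 \mu_{23}(\gamma_3, s)\big] \Big\} \\ = \min_{x \in X} \max\Big\{ &\min_{s \in [0,1]} \big[x_1 \nu(s) + x_2 \nu(1 - s)\big],\ \min_{s \in [0,1]} \big[x_1 \nu(s) + x_3 \nu(1 - s)\big],\ \min_{s \in [0,1]} \big[x_2 \nu(s) + x_3 \nu(1 - s)\big] \Big\}. \end{aligned}$$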


Consider the symmetric solution ($x_i = 1/3$ for each $i$). The corresponding exponent is seen to be $\frac{1}{3} \min_{s \in [0,1]} [\nu(s) + \nu(1 - s)] = \frac{2}{3} \nu(\frac{1}{2})$. (The last equality follows because we are minimizing a convex function which is symmetric around the point 1/2.) Let us now consider the nonsymmetric solution $x_1 = x_2 = 1/2$, $x_3 = 0$. The corresponding exponent is equal to

$$\max\Big\{ \nu\big(\tfrac{1}{2}\big),\ \tfrac{1}{2} \min_{s \in [0,1]} \nu(s) \Big\}.$$

In particular, if $\frac{1}{2} \min_{s \in [0,1]} \nu(s) < \frac{2}{3} \nu(\frac{1}{2})$, then the symmetric solution is not optimal. We have investigated this issue numerically by computing the value of the exponent corresponding to different vectors $x \in X$ (over a fairly dense grid of points in $X$ and for a few different values of $\epsilon$) and we have reached the conclusion that the symmetric solution is always the optimal one. However, an analytical method for establishing that this is the case is not apparent, even though it can be proved that the symmetric solution is a strict local minimum. (The proof of the latter fact is outlined in the Appendix.)

* This example also corrects an error in a corresponding example in [Tsi88].
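The comparison in this example can be reproduced with a few lines of Python. The sketch below evaluates the exponent of the symmetric solution and of the nonsymmetric solution $x_1 = x_2 = 1/2$, $x_3 = 0$ for a given $\epsilon$; the minimization over $s$ is approximated on a uniform grid, which is an assumption of this illustration rather than the method used by the authors, and the function names are hypothetical.

```python
import math

def nu(s, eps):
    """The function nu(s) of the example (M = 3, D = 2)."""
    return math.log((1 - 2 * eps) ** (1 - s) * eps ** s
                    + (2 * eps) ** (1 - s) * (1 - eps) ** s)

def symmetric_exponent(eps):
    """Exponent for x = (1/3, 1/3, 1/3): (2/3) * nu(1/2)."""
    return (2.0 / 3.0) * nu(0.5, eps)

def nonsymmetric_exponent(eps, grid=2001):
    """Exponent for x = (1/2, 1/2, 0): max of nu(1/2) and (1/2) * min_s nu(s)."""
    min_nu = min(nu(k / (grid - 1), eps) for k in range(grid))
    return max(nu(0.5, eps), 0.5 * min_nu)

for eps in (0.05, 0.1, 0.3):
    print(eps, symmetric_exponent(eps), nonsymmetric_exponent(eps))
```

A smaller (more negative) exponent corresponds to a faster decay of the error probability, so the solution with the smaller printed value is the better of the two for that $\epsilon$.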


Without any guaranteed symmetry properties, little progress can be made analytically towards the computation of $\Lambda^*$. For this reason, we shall impose a symmetry requirement and proceed to solve the problem of Eq. (3) subject to this additional constraint. Motivated by the structure of an optimal solution for the low noise case [cf. Eq. (13)], we require that the value of $x_\gamma$ be the same for every $\gamma$ belonging to the same class. Given any vector $x \in X$ satisfying this requirement, let $y_\ell = \sum_{\gamma \in C_\ell} x_\gamma$. We then have $x_\gamma = y_\ell / |C_\ell|$ for every $\gamma \in C_\ell$. Using this expression for $x_\gamma$, the minimization problem of Eq. (3) becomes

$$\Lambda^* = \min_{y_1, \ldots, y_L}\ \max_{\{(i,j) \mid i \ne j\}}\ \min_{s \in [0,1]}\ \sum_{\ell=1}^{L} \frac{y_\ell}{|C_\ell|} \sum_{\gamma \in C_\ell} \mu_{ij}(\gamma, s), \tag{14}$$

where the variables $y_1, \ldots, y_L$ are subject to the constraints $y_\ell \ge 0$, for each $\ell$, and $\sum_{\ell=1}^{L} y_\ell = 1$.


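Proposition 2: (a) Fix some class $C_\ell$. Then, the value of

$$\frac{1}{|C_\ell \cap \Gamma_{ij}|} \sum_{\gamma \in C_\ell \cap \Gamma_{ij}} \mu_{ij}(\gamma, 1/2)$$

is the same for all $i, j$ such that $i \ne j$, and will be denoted by $\alpha_\ell$.

(b) Let $\ell^*$ be such that $S_{\ell^*} |\alpha_{\ell^*}| = \max_\ell S_\ell |\alpha_\ell|$. Then, the choice $y_{\ell^*} = 1$, and $y_\ell = 0$ if $\ell \ne \ell^*$, is an optimal solution of the problem (14).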

Proof: (a) This is evident from the definition of $\mu_{ij}(\gamma, 1/2)$ and symmetry considerations.

(b) Fix some pair $(i, j)$, with $i \ne j$. For any $\gamma \in \Gamma$, define a new decision rule $\sigma(\gamma)$ in which the positions of $i$ and $j$ in the partition corresponding to $\gamma$ are interchanged (see Fig. 4). It is seen that $\sigma$ is a one-to-one and onto mapping of any given class $C_\ell$ into itself. Furthermore, it follows easily from the definition of $\mu_{ij}$ that $\mu_{ij}(\sigma(\gamma), s) = \mu_{ji}(\gamma, s) = \mu_{ij}(\gamma, 1 - s)$. Therefore,

$$\sum_{\gamma \in C_\ell} \mu_{ij}(\gamma, s) = \frac{1}{2} \sum_{\gamma \in C_\ell} \big[\mu_{ij}(\gamma, s) + \mu_{ij}(\sigma(\gamma), s)\big] = \frac{1}{2} \sum_{\gamma \in C_\ell} \big[\mu_{ij}(\gamma, s) + \mu_{ij}(\gamma, 1 - s)\big]. \tag{15}$$

Thus, the expression in the left-hand side of Eq. (15) is symmetric, as a function of $s$, around the value $s = 1/2$. It follows that the minimization with respect to $s$ in Eq. (14) involves a function which is convex and symmetric around the point $s = 1/2$. Hence, the minimum is attained at $s = 1/2$ and Eq. (14) simplifies to

$$\Lambda^* = \min_{y_1, \ldots, y_L}\ \max_{\{(i,j) \mid i \ne j\}}\ \sum_{\ell=1}^{L} \frac{y_\ell}{|C_\ell|} \sum_{\gamma \in C_\ell} \mu_{ij}(\gamma, 1/2). \tag{16}$$

Now, using part (a) of the proposition,

$$\frac{1}{|C_\ell|} \sum_{\gamma \in C_\ell} \mu_{ij}(\gamma, 1/2) = \frac{1}{|C_\ell|} \sum_{\gamma \in C_\ell \cap \Gamma_{ij}} \mu_{ij}(\gamma, 1/2) = \frac{|C_\ell \cap \Gamma_{ij}|}{|C_\ell|}\, \alpha_\ell = \frac{2 S_\ell\, \alpha_\ell}{M(M-1)}, \tag{17}$$

where we have made use of Eq. (6) in the last step. We now use Eq. (17) to further simplify Eq. (16) and obtain

$$\Lambda^* = \min_{y_1, \ldots, y_L}\ \sum_{\ell=1}^{L} y_\ell\, \frac{2}{M(M-1)}\, S_\ell\, \alpha_\ell. \tag{18}$$

Notice that the inequality $\alpha_\ell \le 0$ holds for each $\ell$. Therefore, an optimal solution to the optimization problem of Eq. (18) is obtained by choosing a class $C_{\ell^*}$ for which the value of $S_\ell |\alpha_\ell|$ is maximized and letting $y_{\ell^*} = 1$, and $y_\ell = 0$ if $\ell \ne \ell^*$. Q.E.D.


Our conclusions are therefore similar to the small noise case. In particular, there exists a best
class and all decision rules to be used should belong to a best class. The nature of the best class is
interesting. The constant $\alpha_\ell$ can be interpreted as a measure of the contribution of an “average”
element of $C_\ell$ to a pair of hypotheses which are separated by that decision rule (see Prop. 2(a)). The
product $S_\ell |\alpha_\ell|$ weighs the number of separations of a decision rule in $C_\ell$ by the “quality measure”
$\alpha_\ell$ and the value of this product is used to determine a best class.

The identity of the best class cannot be determined analytically because the formulas for the
coefficients $\alpha_\ell$ are somewhat cumbersome. On the other hand, for any given value of $\epsilon$, the value
of $\alpha_\ell$ is easy to compute numerically. We have done so for the case where $D = 2$ and for $M =
5, 10, 20, 30$ [Poly88]. We summarize the results. When $\epsilon$ is very small, the optimal class
is the one which partitions the observation set evenly, in agreement with the results of Section
4. Interestingly enough, this same class remains optimal for larger values of $\epsilon$ as well, up to
approximately $1/M$. At about that point, the identity of the optimal class changes, and the
optimal class is a most uneven one, namely the class $C_{1,M-1}$. This latter class remains the best
one for all $\epsilon$ up to $1/(M - 1)$ (which is the largest allowed value for $\epsilon$).
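
The comparison of classes summarized above can be illustrated for $D = 2$ with a short Python sketch. This is not the code used for [Poly88]. It assumes the symmetric observation model in which $\Pr(y = i \mid H_j)$ equals $\epsilon$ for $i \neq j$ and $1 - (M-1)\epsilon$ for $i = j$, and it takes $\mu_{ij}(\gamma, 1/2)$ to be the logarithm of $\sum_u \sqrt{\Pr(u \mid H_i) \Pr(u \mid H_j)}$; both assumptions should be checked against the definitions given earlier. The function names are hypothetical, and a class is identified by the size m of the smaller block of the partition.

```python
import math

def alpha(M, eps, m):
    """Assumed alpha for the class with block sizes (m, M - m): mu_ij(gamma, 1/2)
    for a pair (i, j) separated by gamma, with i in the block of size m."""
    # Message distribution under H_i (i lies in the block of size m).
    p_i = [max(0.0, 1.0 - (M - m) * eps), (M - m) * eps]
    # Message distribution under H_j (j lies in the other block).
    p_j = [m * eps, max(0.0, 1.0 - m * eps)]
    bhattacharyya = sum(math.sqrt(a * b) for a, b in zip(p_i, p_j))
    return math.log(bhattacharyya)

def separations(M, m):
    """Number of unordered hypothesis pairs separated by a rule of the class."""
    return m * (M - m)

def best_class(M, eps):
    """Block size m (1 <= m <= M // 2) maximizing S * |alpha|."""
    return max(range(1, M // 2 + 1),
               key=lambda m: separations(M, m) * abs(alpha(M, eps, m)))

if __name__ == "__main__":
    for eps in (0.01, 0.25):
        print(eps, best_class(5, eps))
```

Under these assumptions, for $M = 5$ the sketch selects the even class (blocks of sizes 2 and 3) when $\epsilon = 0.01$ and the class $C_{1,M-1}$ when $\epsilon = 1/(M-1)$, in line with the summary above.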

The case $\epsilon = 1/(M - 1)$ has an interesting interpretation. Here, the probability $\Pr(y = i \mid H_j)$
is equal to $\epsilon$ if $i \neq j$, and is zero if $i = j$. Thus, an observation $y = i$ provides absolute proof that
$H_i$ is not the true hypothesis. If the sensors use decision rules $\gamma \in C_{1,M-1}$ of the form $\gamma(i) = 1$
and $\gamma(j) = 2$, for $j \neq i$, then a message with the value 1 allows the fusion center to eliminate one
of the hypotheses. On the other hand, if decision rules in classes other than $C_{1,M-1}$ are used, then
the fusion center is not able to make unequivocal inferences. This argument suggests that $C_{1,M-1}$
is the optimal class, as confirmed by our numerical experiments.

6.  DESIGN  OF  THE  OPTIMAL  COMMUNICATION  RATE  FOR  THE  SMALL 
NOISE  CASE. 

A fundamental design problem in decentralized decision making concerns the choice of the
communication rate (or available bandwidth) between the different decision making units. Such design
problems are usually very hard and very little analysis is possible, except for simple situations. For
this reason, the solution of even idealized problems can provide valuable intuition. We consider
such a design problem, in the context of our decentralized detection problem, under the small
noise assumption.

We express the communication rate of each sensor as a function of the variable $D$. In particular,
we view the number $\lceil \log_2 D \rceil$ as the number of binary messages that each sensor must transmit to
the fusion center†. Clearly, a higher value of $D$ leads to better performance (smaller probability
of error at the fusion center) since a decision is made with more information. On the other hand,
communication resources may be scarce, in which case an upper bound can be imposed on the total
communication rate in the system.

† In an alternative formulation we could use $\log_2 D$ instead of $\lceil \log_2 D \rceil$. Which one of these
choices is more appropriate could depend on the particular coding method used for transmission.
In any case, our subsequent results can be shown to remain valid under this alternative formulation.

Accordingly, we assume that

$$N \lceil \log_2 D \rceil \le K, \qquad (19)$$

where $K$ is a given positive integer. Given such a constraint, we pose the question: “Is it better
to have few sensors communicating at high rate, or more sensors communicating at low rate?” We
formulate the above described problem in mathematical terms. We view the optimal error exponent
$\Lambda^*$ as a function of $D$ and we use the more suggestive notation $\Lambda^*(D)$. Furthermore, we consider
the small noise case for which we can use the approximation [cf. Eq. (10) and Prop. 1]

where  [cf.  Eq.  (5)] 

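$$S^*(D) = \max_{\delta \in \Delta_D} \; \frac{1}{2} \sum_{d=1}^{D} \delta_d (M - \delta_d), \qquad (21)$$

and $\Delta_D$ is the set of all vectors $\delta = (\delta_1, \ldots, \delta_D)$ such that each $\delta_d$ is a nonnegative integer and
$\sum_{d=1}^{D} \delta_d = M$. Recall that the error probability behaves, asymptotically as $N \to \infty$, like $e^{N \Lambda^*(D)}$.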
We  are  then  led  to  the  problem 

$$\min \; N \Lambda^*(D) \qquad (22)$$

subject to the constraint (19). (Of course, $N$ and $D$ are also constrained to be integers larger
than 1.)
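
The quantity $S^*(D)$ of Eq. (21) is a small integer program and can be evaluated by direct enumeration. The following Python sketch is only an illustration: it enumerates the vectors in $\Delta_D$ up to reordering (the objective does not depend on the order of the entries) and keeps the largest number of separated pairs. The function names are hypothetical.

```python
def partitions(total, parts, cap=None):
    """Yield non-increasing tuples of `parts` nonnegative integers summing to `total`."""
    if cap is None:
        cap = total
    if parts == 1:
        if total <= cap:
            yield (total,)
        return
    for first in range(min(total, cap), -1, -1):
        for rest in partitions(total - first, parts - 1, first):
            yield (first,) + rest

def separated_pairs(delta, M):
    """(1/2) * sum_d delta_d * (M - delta_d); the sum is always even."""
    return sum(d * (M - d) for d in delta) // 2

def S_star(M, D):
    """Brute-force value of the maximum in Eq. (21)."""
    return max(separated_pairs(delta, M) for delta in partitions(M, D))

if __name__ == "__main__":
    for D in (2, 3, 4):
        print(D, S_star(8, D))
```

In every case tried, the maximum is attained by the most even split of $M$ into $D$ parts and never exceeds $(M^2/2)(1 - 1/D)$, the value obtained when the integrality constraints are dropped.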


Proposition 3: An optimal solution of the problem defined by Eqs. (19) and (22) is given by
$D = 2$, $N = K$.

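Proof: We use Eqs. (20) and (21) and the fact that $\log \epsilon$ is negative to formulate the problem (22)
in the form

$$\max \; N F(D), \qquad (23)$$

where

$$F(D) = \max_{\delta \in \Delta_D} \; \frac{1}{2} \sum_{d=1}^{D} \delta_d (M - \delta_d). \qquad (24)$$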


Let us recall that the optimization problem in the definition of $F(D)$ was solved at the end of
Section 4. In particular, it is seen that

$$F(2) = \begin{cases} M^2/4, & \text{if } M \text{ is even}, \\ (M^2 - 1)/4, & \text{if } M \text{ is odd}, \end{cases}$$

and

$$F(D) \le \frac{M^2}{2} \Big( 1 - \frac{1}{D} \Big).$$

[The above inequality is obtained because $\delta_1 = \cdots = \delta_D = M/D$ is the optimal solution in Eq. (24)
when the integrality constraints are relaxed.]

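We compare the solution $N = K$, $D = 2$, with the solution $N = \lfloor K/2 \rfloor$, $D = 3$. It is easily
verified that

$$K \, \frac{M^2 - 1}{4} \ge \frac{K}{2} \cdot \frac{M^2}{2} \Big( 1 - \frac{1}{3} \Big),$$

which shows that the solution with $D = 2$ is preferable. Similarly,

$$K \, \frac{M^2 - 1}{4} \ge \frac{K}{2} \cdot \frac{M^2}{2} \Big( 1 - \frac{1}{4} \Big), \qquad \forall M \ge 2,$$

and $D = 2$ is also preferable to $D = 4$. Finally, if $D > 4$, then $\lceil \log_2 D \rceil \ge 3$ and $N \le K/3$. We
have

$$K \, \frac{M^2 - 1}{4} \ge \frac{K}{6} M^2 \ge \frac{K}{6} M^2 \Big( 1 - \frac{1}{D} \Big), \qquad \forall M \ge 2, \; \forall D > 4,$$

and $D = 2$ is again preferable. Q.E.D.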
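
Proposition 3 can also be checked by exhaustive search for particular values of $K$ and $M$. The Python sketch below is an illustration under stated assumptions, not part of the proof: it evaluates $F(D)$ with the most even split of $M$ into $D$ parts (which maximizes $\sum_d \delta_d (M - \delta_d)$ because it minimizes $\sum_d \delta_d^2$), uses the largest $N$ allowed by the constraint (19), and restricts the search to $D \le M$, since larger $D$ cannot separate more pairs but costs at least as many bits. The function names are hypothetical.

```python
def F(M, D):
    """Value of Eq. (24), computed from the most even split of M into D parts."""
    q, r = divmod(M, D)
    sizes = [q + 1] * r + [q] * (D - r)
    return sum(d * (M - d) for d in sizes) // 2

def bits(D):
    """ceil(log2(D)) for an integer D >= 1, computed exactly."""
    return (D - 1).bit_length()

def best_design(K, M):
    """Return (N, D, N * F(D)) maximizing N * F(D) subject to N * bits(D) <= K."""
    best = None
    for D in range(2, max(M, 2) + 1):
        N = K // bits(D)          # largest N satisfying the rate constraint
        if N < 1:
            continue
        value = N * F(M, D)
        if best is None or value > best[2]:
            best = (N, D, value)
    return best

if __name__ == "__main__":
    print(best_design(12, 8))
```

For example, with $K = 12$ and $M = 8$ the search returns $N = 12$, $D = 2$, with objective value $12 \cdot 16 = 192$; the alternatives $D = 3$ and $D = 4$ (both with $N = 6$) give 126 and 144, respectively.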


Generally  speaking,  intuition  suggests  that  it  is  better  to  have  several  sensors  transmitting 
low  rate  but  independent  information,  rather  than  few  sensors  transmitting  detailed  information. 
The  above  result  corroborates  this  intuition,  at  least  for  the  particular  problem  under  study.  An 
alternative  statement  of  this  result,  which  is  pertinent  to  organizations  involving  human  decision 
makers,  is  the  following:  if  a  decision  maker  is  to  receive  a  set  of  reports  of  a  given  total  length,  it 
is  preferable  to  receive  many  partial  but  independently  drafted  reports,  rather  than  a  few  lengthy 
ones. 


7.  CONCLUSIONS 

We have considered the asymptotic (as the number of sensors goes to infinity) solution of a
particularly simple symmetric problem in decentralized detection. While the problem is very idealized,
the conclusions obtained agree with intuition and could be useful as guiding principles for more
general problems. Roughly stated, the following guidelines suggest themselves:

a) It is preferable to have several independent sensors transmitting low rate (coarse) information
instead of few sensors transmitting high rate (very detailed) information. (Of course, this guideline
is  meaningful  if  it  is  assumed  that  the  addition  of  more  sensors  does  not  lead  to  increased  “setup” 
costs;  in  other  words,  it  is  assumed  that  many  sensors  are  readily  available  and  the  only  question 
is  whether  they  can  be  usefully  employed.) 

b) An $M$-ary hypothesis testing problem can be viewed as a collection of $M(M - 1)/2$ binary
hypothesis testing problems. Under this point of view, the most useful messages by the sensors
(decision  rules)  are  those  which  provide  information  to  the  fusion  center  that  is  relevant  to  the 
largest  possible  number  of  binary  hypothesis  testing  problems. 

To  what  extent  the  above  two  guidelines  can  be  verified  analytically  or  experimentally  in  more 
realistic  problems  is  an  interesting  question  which  is  left  for  further  research. 

APPENDIX 


We outline here a proof that the symmetric solution ($x_i = 1/3$, for $i = 1, 2, 3$) is a strict local
minimum for the problem considered in the example of Section 5. The problem under consideration
can be stated as:

$$\Lambda^* = \min_{x \in X} F(x),$$

where

$$F(x) = \max_{i < j} F_{ij}(x), \qquad (A.1)$$

and

$$F_{ij}(x) = \min_{s \in [0,1]} \big[ x_i \mu(s) + x_j \mu(1 - s) \big],$$

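where $i, j \in \{1, 2, 3\}$. Let $x^* = (1/3, 1/3, 1/3)$. The function $\mu(\cdot)$ is strictly convex and continuously
differentiable, and the minimum in the definition of $F_{ij}(x^*)$ is uniquely attained at $s = 1/2$. We
can then use Danskin's Theorem [Dan67] to obtain

$$\frac{\partial F_{ij}}{\partial x_k}(x^*) = \begin{cases} 0, & \text{if } k \neq i \text{ and } k \neq j, \\ \mu(1/2), & \text{if } k = i \text{ or } k = j. \end{cases}$$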


Consider any direction $d \in \Re^3$, $d \neq 0$, in which $x^*$ can be perturbed without leaving the set $X$.
(That is, $d = (d_1, d_2, d_3)$ with $d_1 + d_2 + d_3 = 0$.) The chain rule yields

$$\frac{d F_{ij}(x^* + \alpha d)}{d\alpha} \bigg|_{\alpha = 0} = \sum_{k=1}^{3} \frac{\partial F_{ij}}{\partial x_k}(x^*) \, d_k = \mu(1/2) \, (d_i + d_j). \qquad (A.2)$$

Notice that the assumptions $d \neq 0$, $d_1 + d_2 + d_3 = 0$ imply that there exist some $i, j$ such that
$d_i + d_j < 0$. Since $\mu(1/2) < 0$, it follows that for every choice of $d$, the left-hand side of Eq. (A.2)
is positive for some pair $(i, j)$. Thus, for each direction $d$, some function $F_{ij}(x)$ has to increase.
Taking Eq. (A.1) into account, $F(x)$ must also increase. From this point on, it is only a small step
to show that $F(x)$ is larger than $F(x^*)$ in a neighborhood of $x^*$, i.e., that $x^*$ is a local minimum.
(The details of this last step are omitted.)
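
The first-order argument of this appendix can be checked numerically. The Python sketch below is an illustration only and rests on an assumed form of $\mu(s)$: it takes $M = 3$, $D = 2$, a decision rule that isolates one hypothesis, and observation probabilities $\epsilon$ off the diagonal and $1 - 2\epsilon$ on it, with the hypothetical value $\epsilon = 0.05$. The exact $\mu$ of the Section 5 example should be substituted if it differs. The inner minimization over $s$ uses a ternary search, which is valid because the objective is convex in $s$.

```python
import math
from itertools import combinations

def mu(s, eps=0.05):
    """Assumed log-moment function of a rule isolating hypothesis i (M = 3, D = 2)."""
    return math.log((1 - 2 * eps) ** (1 - s) * eps ** s
                    + (2 * eps) ** (1 - s) * (1 - eps) ** s)

def F_pair(xi, xj, iters=200):
    """min over s in [0, 1] of xi * mu(s) + xj * mu(1 - s), by ternary search."""
    g = lambda s: xi * mu(s) + xj * mu(1.0 - s)
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if g(a) < g(b):
            hi = b
        else:
            lo = a
    return g(0.5 * (lo + hi))

def F(x):
    """max over pairs i < j of F_ij(x), as in Eq. (A.1)."""
    return max(F_pair(x[i], x[j]) for i, j in combinations(range(3), 2))

def increases(d, eta=1e-3):
    """True if F grows when x* = (1/3, 1/3, 1/3) is moved a small step along d."""
    x_star = (1 / 3, 1 / 3, 1 / 3)
    x = tuple(xs + eta * dk for xs, dk in zip(x_star, d))
    return F(x) > F(x_star)

if __name__ == "__main__":
    for d in [(1, -1, 0), (1, 1, -2), (-2, 1, 1), (0.3, -0.5, 0.2)]:
        print(d, increases(d))
```

Each test direction sums to zero, so the perturbed point stays in $X$. Under the stated assumptions, $F$ increases along every one of them, as the sign argument based on Eq. (A.2) predicts.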